Nested Propositions in Open Information Extraction
Abstract
The challenges of Machine Reading and Knowledge Extraction at web scale require a system capable of extracting diverse information from large, heterogeneous corpora. The Open Information Extraction (OIE) paradigm aims to extract assertions from large corpora without requiring a predefined vocabulary or relation-specific training data. Most systems built on this paradigm extract binary relations from arbitrary sentences, ignoring the context under which the assertions are correct and complete. They lack the expressiveness needed to properly represent and extract the complex assertions commonly found in text. To address this lack of representational power, we propose NESTIE, which uses a nested representation to extract higher-order relations and complex, interdependent assertions. Nesting the extracted propositions allows NESTIE to reflect the meaning of the original sentence more accurately. Our experimental study on real-world datasets suggests that NESTIE obtains comparable precision with better minimality and informativeness than existing approaches: it produces 1.7 to 1.8 times more minimal extractions and achieves 1.1 to 1.2 times higher informativeness than CLAUSIE.
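To make the idea of a nested representation concrete, the following is a minimal Python sketch, assuming a hypothetical `Proposition` class in which an argument may itself be another proposition. The class name, its fields, and the example sentence ("Scientists believe that global warming causes sea levels to rise") are illustrative assumptions only; they are not NESTIE's actual data model or output.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union

# Hypothetical: an argument is either a plain text span or a nested proposition.
Argument = Union[str, "Proposition"]


@dataclass(frozen=True)
class Proposition:
    """Hypothetical (subject; relation; object) triple whose arguments may nest."""
    subject: Argument
    relation: str
    obj: Argument

    def depth(self) -> int:
        """Nesting depth: 1 for a flat triple, plus 1 per level of embedding."""
        inner = [a.depth() for a in (self.subject, self.obj)
                 if isinstance(a, Proposition)]
        return 1 + max(inner, default=0)

    def render(self) -> str:
        """Render the triple, wrapping nested propositions in parentheses."""
        def fmt(a: Argument) -> str:
            return f"({a.render()})" if isinstance(a, Proposition) else a
        return f"{fmt(self.subject)}; {self.relation}; {fmt(self.obj)}"


# A flat extractor would keep "that global warming causes sea levels to rise"
# as one opaque object string; nesting keeps the embedded claim as its own
# proposition, with the outer one supplying the context ("Scientists believe").
inner = Proposition("global warming", "causes", "sea levels to rise")
outer = Proposition("Scientists", "believe", inner)

print(outer.render())  # Scientists; believe; (global warming; causes; sea levels to rise)
print(outer.depth())   # 2
```

The point of the design is that the inner assertion is not stated as unconditionally true on its own; it is an argument of the outer proposition, which records the context under which it holds.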
Similar resources
An Overview of Open Information Extraction
Open Information Extraction (OIE) is a recent unsupervised strategy for extracting large numbers of basic propositions (verb-based triples) from massive text corpora, and it scales to Web-size document collections. We will introduce the main properties of this extraction method.
Open Extraction of Fine-Grained Political Statements
Text data has recently been used as evidence in estimating the political ideologies of individuals, including political elites and social media users. While inferences about people are often the intrinsic quantity of interest, we draw inspiration from open information extraction to identify a new task: inferring the political import of propositions like OBAMA IS A SOCIALIST. We present several ...
Multilingual Open Information Extraction
Open Information Extraction (OIE) is a recent unsupervised strategy for extracting large numbers of basic propositions (verb-based triples) from massive text corpora, and it scales to Web-size document collections. We propose a multilingual rule-based OIE method that takes as input dependency parses in the CoNLL-X format, identifies argument structures within the dependency parses, and extracts a set...
Open Knowledge Extraction through Compositional Language Processing
We present results for a system designed to perform Open Knowledge Extraction, based on a tradition of compositional language processing, as applied to a large collection of text derived from the Web. Evaluation through manual assessment shows that well-formed propositions of reasonable quality, representing general world knowledge, given in a logical form potentially usable for inference, may ...
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP), intensified by the growing volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...